ATOM Documentation


Gap Analysis: External API Integrations for Costs, Health, and Benchmarks

**Date:** 2026-04-01

**Comparison:** atom-saas (SaaS) vs atom-upstream (Open Source)

**Scope:** Costs, Provider Health, Benchmark Data

---

Executive Summary

**Key Finding:** Upstream has **DynamicPricingFetcher** that integrates with **LiteLLM** and **OpenRouter APIs** for real-time cost data. SaaS uses hardcoded costs. Neither uses external APIs for health monitoring or benchmarks.

**Critical Gap:** SaaS is missing the DynamicPricingFetcher integration, meaning:

  • Pricing updates require code changes
  • No automatic price syncing when providers change rates
  • Missing cache-aware routing features
  • No prompt caching optimization data

---

1. Cost Tracking

✅ Upstream (atom-upstream)

**File:** atom-upstream/backend/core/dynamic_pricing_fetcher.py

**External APIs:**

  1. **LiteLLM GitHub** - https://raw.githubusercontent.com/BerriAI/litellm/main/model_prices_and_context_window.json
  • Fetches a comprehensive pricing database
  • Updated regularly by the LiteLLM community
  • Includes 100+ models across all providers
  2. **OpenRouter API** - https://openrouter.ai/api/v1/models
  • Real-time model pricing
  • Provider: OpenRouter
  • Fallback when LiteLLM data is missing

**Features:**

```python
class DynamicPricingFetcher:
    async def refresh_pricing(self, force: bool = False) -> Dict[str, Any]:
        # Fetch from both sources
        litellm_pricing = await self.fetch_litellm_pricing()
        openrouter_pricing = await self.fetch_openrouter_pricing()

        # Merge pricing (LiteLLM takes precedence)
        self.pricing_cache = {**openrouter_pricing, **litellm_pricing}

        # Save to cache (24-hour TTL)
        self._save_cache()
        return self.pricing_cache
```

**Cache Strategy:**

  • Local file cache: ./data/ai_pricing_cache.json
  • 24-hour TTL before refresh
  • Singleton pattern for efficiency

**Advanced Features:**

  • model_supports_cache(model_name) - Check if model supports prompt caching
  • get_cache_min_tokens(model_name) - Minimum tokens for caching (1024 OpenAI, 2048 Anthropic)
  • is_pricing_estimated(model_name) - Distinguish official vs estimated pricing
  • get_cheapest_models(limit) - Find lowest-cost models
  • compare_providers() - Compare average costs across providers

**Usage in Upstream:**

```python
# Integrated into BYOKHandler and routing logic
from core.dynamic_pricing_fetcher import get_pricing_fetcher

fetcher = get_pricing_fetcher()
pricing = await fetcher.refresh_pricing()
cost = fetcher.estimate_cost("gpt-4o", 1000, 500)
```
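The arithmetic behind `estimate_cost` is simple per-token math. A sketch using the LiteLLM schema's `input_cost_per_token` / `output_cost_per_token` fields; the method's actual internals are an assumption, and the rates below are illustrative:

```python
def estimate_cost(pricing: dict, model: str,
                  input_tokens: int, output_tokens: int) -> float:
    """Cost = input tokens x input rate + output tokens x output rate."""
    entry = pricing[model]
    return (input_tokens * entry["input_cost_per_token"]
            + output_tokens * entry["output_cost_per_token"])


# Illustrative rates of $2.50 / $10.00 per million tokens:
sample = {"gpt-4o": {"input_cost_per_token": 2.5e-06,
                     "output_cost_per_token": 1.0e-05}}
cost = estimate_cost(sample, "gpt-4o", 1000, 500)  # 0.0025 + 0.0050
```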

❌ SaaS (atom-saas)

**Files:**

  • backend-saas/core/llm/embedding/providers.py - Embedding costs
  • backend-saas/core/llm/byok_handler.py - LLM routing (hardcoded)

**Current Implementation:**

```python
# backend-saas/core/llm/embedding/providers.py
MODELS = {
    "text-embedding-3-small": {
        "cost_per_1m_tokens": 0.02,  # ❌ HARDCODED
    },
    "text-embedding-3-large": {
        "cost_per_1m_tokens": 0.13,  # ❌ HARDCODED
    },
    # ... similar for Cohere, Voyage, Nomic, Jina
}
```

**Problems:**

  • Pricing becomes outdated when providers change rates
  • Requires code deployment to update costs
  • No automatic synchronization
  • No cache-aware routing optimizations

**Missing Features:**

  • ❌ Dynamic pricing updates from external APIs
  • ❌ LiteLLM integration
  • ❌ OpenRouter fallback
  • ❌ Prompt caching support detection
  • ❌ Cost comparison across providers
  • ❌ Cheapest model discovery

---

2. Provider Health Monitoring

✅ Both Implementations (Similar)

**Upstream:** atom-upstream/backend/core/provider_health_monitor.py

**SaaS:** backend-saas/core/llm/registry/provider_health.py

**Implementation:** Both use **INTERNAL** tracking (no external APIs)

**Similar Features:**

  • Success/error rate tracking
  • Latency monitoring (rolling average)
  • Consecutive failure detection
  • Health score calculation (0.0-1.0 scale)

**Upstream (ProviderHealthMonitor):**

```python
class ProviderHealthMonitor:
    def record_call(self, provider_id: str, success: bool, latency_ms: float):
        # Track in sliding window (5 minutes default)
        history.append((timestamp, success, latency_ms))

        # Calculate health: 70% success_rate + 30% latency_score
        health_score = (success_rate * 0.7) + (latency_score * 0.3)
```
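Filling in the elided pieces of the excerpt above, the sliding-window approach looks roughly like this. A self-contained sketch, not upstream's code: the eviction logic and the 5000 ms latency normalizer are assumptions; only the 5-minute window and 70/30 weighting come from the document.

```python
import time
from collections import deque


class SlidingWindowHealthMonitor:
    """Sketch of in-memory health tracking with a bounded window."""

    def __init__(self, window_seconds: float = 300.0,
                 max_latency_ms: float = 5000.0):
        self.window_seconds = window_seconds
        self.max_latency_ms = max_latency_ms  # assumed normalizer
        self.history: dict[str, deque] = {}

    def record_call(self, provider_id: str, success: bool,
                    latency_ms: float) -> None:
        history = self.history.setdefault(provider_id, deque())
        now = time.monotonic()
        history.append((now, success, latency_ms))
        # Evict entries older than the window so memory stays bounded
        while history and now - history[0][0] > self.window_seconds:
            history.popleft()

    def health_score(self, provider_id: str) -> float:
        history = self.history.get(provider_id)
        if not history:
            return 1.0  # no data: assume healthy
        successes = sum(1 for _, ok, _ in history if ok)
        success_rate = successes / len(history)
        avg_latency = sum(lat for _, _, lat in history) / len(history)
        latency_score = max(0.0, 1.0 - avg_latency / self.max_latency_ms)
        # 70% weight on success rate, 30% on latency (per upstream)
        return success_rate * 0.7 + latency_score * 0.3
```

The eviction-on-write is what prevents the memory leaks noted as the key difference below.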

**SaaS (ProviderHealthService):**

```python
class ProviderHealthService:
    async def record_success(self, provider: str, latency_ms: float):
        # Track in Redis with 1-hour TTL
        # Rolling average latency calculation
        # Health state transitions (HEALTHY/DEGRADED/UNHEALTHY)
```

**Key Difference:**

  • Upstream: In-memory deque with sliding window (prevents memory leaks)
  • SaaS: Redis-backed with tenant isolation

**Neither uses external APIs** for:

  • ❌ Provider status pages (e.g., status.openai.com)
  • ❌ Uptime monitoring services
  • ❌ Third-party health check APIs

---

3. Benchmark Data

❌ Both Implementations (Identical)

**Upstream:** atom-upstream/backend/core/benchmarks.py

**SaaS:** backend-saas/core/benchmarks.py

**Implementation:** Both use **STATIC HARDCODED** scores

**Source:**

"""
Curated Quality Scores for AI Models
Normalized 0-100 scale based on MMLU, GSM8K, HumanEval, and LMSYS Chatbot Arena.
Updated Jan 2026
"""
MODEL_QUALITY_SCORES = {
    "gemini-3-pro": 100,      # ❌ HARDCODED
    "gpt-5": 99,              # ❌ HARDCODED
    "claude-4-opus": 99,       # ❌ HARDCODED
    # ... 50+ models
}
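Any consumer of this table needs a safe lookup, since routing sees model names that may not be curated yet. A minimal sketch; the neutral default of 50 is illustrative and not from either codebase:

```python
MODEL_QUALITY_SCORES = {
    "gemini-3-pro": 100,
    "gpt-5": 99,
    "claude-4-opus": 99,
}


def quality_score(model: str, default: int = 50) -> int:
    """Return the curated score, falling back to a neutral default
    so routing never crashes on a model missing from the table."""
    return MODEL_QUALITY_SCORES.get(model, default)
```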

**Problems:**

  • Scores become outdated as new models are released
  • Manual updates required when benchmarks change
  • No automatic synchronization with leaderboard APIs

**Missing Features:**

  • ❌ LMSYS Chatbot Arena API integration
  • ❌ MMLU/GSM8K/HumanEval API integration
  • ❌ Automatic benchmark updates
  • ❌ Real-time leaderboard polling

**Note:** This is understandable since benchmark leaderboards don't always have public APIs, and manual curation provides quality control.

---

4. Feature Comparison Table

| Feature | Upstream | SaaS | Gap |
|---|---|---|---|
| **Cost Tracking** | | | |
| Dynamic pricing via LiteLLM API | ✅ | ❌ | **HIGH PRIORITY** |
| Dynamic pricing via OpenRouter API | ✅ | ❌ | **MEDIUM** |
| Local pricing cache (24h TTL) | ✅ | ❌ | **HIGH** |
| Prompt caching support detection | ✅ | ❌ | **MEDIUM** |
| Cache min-threshold tracking | ✅ | ❌ | **LOW** |
| Provider cost comparison | ✅ | ❌ | **MEDIUM** |
| Cheapest model discovery | ✅ | ❌ | **LOW** |
| Estimated pricing flags | ✅ | ❌ | **LOW** |
| **Health Monitoring** | | | |
| Internal success/error tracking | ✅ | ✅ | None |
| Internal latency tracking | ✅ | ✅ | None |
| Sliding window (5 min) | ✅ | ❌ (1h TTL in Redis) | **LOW** |
| External provider status page checks | ❌ | ❌ | **FUTURE** |
| **Benchmarks** | | | |
| Static quality scores | ✅ | ✅ | None |
| Manual curation | ✅ | ✅ | None |
| External leaderboard APIs | ❌ | ❌ | **FUTURE** |

---

5. Impact Analysis

Business Impact

**SaaS Gaps:**

  1. **Stale Pricing** - If OpenAI/Anthropic change prices, SaaS cost estimates stay stale until new code is deployed
  2. **Missed Savings** - No cache-aware routing optimization (forgoing up to ~90% savings on cached input tokens, depending on provider)
  3. **Manual Updates** - DevOps required to update pricing in code

**Upstream Advantages:**

  1. **Auto-Updating** - Pricing updates every 24 hours from LiteLLM (community-maintained)
  2. **Cost Optimization** - Can route to cheapest models dynamically
  3. **Cache Savings** - Prompt caching support can reduce input-token costs by up to 90% for applicable models

Technical Debt

**SaaS Technical Debt:**

  • Missing dynamic_pricing_fetcher.py (~400 lines)
  • No pricing cache infrastructure
  • BYOKHandler doesn't use dynamic pricing
  • No cache-aware routing in LLM service

**Estimated Effort to Port:**

  • Copy dynamic_pricing_fetcher.py → 2 hours
  • Remove SaaS-specific patterns → 1 hour
  • Integrate with BYOKHandler → 2 hours
  • Add pricing refresh cron job → 1 hour
  • Testing & validation → 2 hours
  • **Total: ~8 hours**

---

6. Recommendations

Priority 1: Port DynamicPricingFetcher (HIGH VALUE)

**Actions:**

  1. Copy atom-upstream/backend/core/dynamic_pricing_fetcher.py to SaaS
  2. Remove hard-coded costs from embedding providers
  3. Integrate with BYOKHandler for cost estimation
  4. Add background task to refresh pricing every 24 hours
  5. Add tenant isolation to pricing cache (multi-tenancy requirement)
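Step 4 could be a Celery beat task or, more simply, an asyncio background loop started at app startup. A sketch of the loop variant; the fetcher interface matches the upstream usage shown earlier, everything else is illustrative:

```python
import asyncio


async def pricing_refresh_loop(fetcher,
                               interval_seconds: float = 24 * 3600) -> None:
    """Periodically force-refresh pricing; swallow errors so one failed
    fetch doesn't kill the loop (the 24h cache covers the gap)."""
    while True:
        try:
            await fetcher.refresh_pricing(force=True)
        except Exception:
            pass  # a real implementation would log and emit a metric
        await asyncio.sleep(interval_seconds)
```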

**Benefits:**

  • Automatic pricing updates (no code deployments needed)
  • Access to 100+ models with accurate pricing
  • Cache-aware routing for up to 90% savings on cached input tokens
  • Provider cost comparison for optimization

Priority 2: Enhance Health Monitoring (MEDIUM VALUE)

**Actions:**

  1. Consider porting upstream's sliding window approach (prevents memory leaks)
  2. Add external provider status page checks (optional enhancement)
  • OpenAI: https://status.openai.com/api/v2/status.json
  • Anthropic: (no public API, but could scrape status page)
  • Google: https://status.cloud.google.com

**Benefits:**

  • Proactive provider health detection
  • Faster recovery from provider outages
  • Better routing decisions with real-time data

Priority 3: Benchmark Updates (LOW VALUE)

**Actions:**

  1. Keep manual curation (quality control)
  2. Set quarterly review schedule to update benchmarks
  3. Consider adding "last_updated" timestamp to track freshness

**Rationale:**

  • Benchmark leaderboards don't always have public APIs
  • Manual curation prevents low-quality data from entering system
  • Upstream uses the same approach, suggesting this is acceptable

---

7. Implementation Plan

Phase 1: Port DynamicPricingFetcher

**Tasks:**

  1. Copy dynamic_pricing_fetcher.py from upstream
  2. Add SaaS-specific modifications:
  • Remove local file cache (use Redis instead)
  • Add tenant_id isolation for pricing queries
  • Add tenant-scoped pricing overrides (enterprise feature)
  3. Update embedding providers to use dynamic pricing
  4. Integrate with BYOKHandler
  5. Add pricing refresh cron job (Celery task)
  6. Write unit tests

**Estimated Time:** 1-2 days

Phase 2: Cache-Aware Routing

**Tasks:**

  1. Add prompt caching support to BYOKHandler
  2. Implement cache min-threshold checks
  3. Update routing logic to prefer cached models
  4. Track cache hit/miss metrics
  5. Add cost savings analytics
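Tasks 4-5 need little more than a per-model counter. A minimal sketch; class and method names are illustrative, not from either codebase:

```python
from collections import Counter


class CacheMetrics:
    """Track per-model prompt-cache hits/misses for savings analytics."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def record(self, model: str, cache_hit: bool) -> None:
        self.counts[(model, "hit" if cache_hit else "miss")] += 1

    def hit_rate(self, model: str) -> float:
        hits = self.counts[(model, "hit")]
        total = hits + self.counts[(model, "miss")]
        return hits / total if total else 0.0
```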

**Estimated Time:** 2-3 days

Phase 3: Health Monitoring Enhancements (Optional)

**Tasks:**

  1. Port sliding window approach from upstream
  2. Add provider status page polling (OpenAI, Anthropic)
  3. Update health score calculation to include external status
  4. Add alerting for provider degradation

**Estimated Time:** 1-2 days

---

8. Risk Assessment

Risks of Porting DynamicPricingFetcher

**Low Risk:**

  • Well-tested code from upstream
  • No breaking changes to existing APIs
  • Cache fallback if external APIs fail

**Medium Risk:**

  • Dependency on external GitHub/OpenRouter availability
  • Rate limiting on external APIs
  • Need to handle API failures gracefully

**Mitigations:**

  • Use 24-hour cache (retries have 24 hours to succeed)
  • Store fallback pricing in database
  • Graceful degradation to hardcoded costs if APIs fail
  • Monitor API call success rates

SaaS-Specific Considerations

**Multi-Tenancy:**

  • Pricing cache should be global (not per-tenant)
  • Enterprise tenants may have custom pricing overrides
  • Consider pricing tiers for different plans

**Billing:**

  • Dynamic pricing affects cost estimates
  • Need to track actual vs estimated costs
  • Consider margin protection (pricing updates shouldn't break margins)

---

9. Testing Strategy

Unit Tests

```python
import pytest

from core.dynamic_pricing_fetcher import DynamicPricingFetcher


@pytest.mark.asyncio
async def test_fetch_litellm_pricing():
    fetcher = DynamicPricingFetcher()
    pricing = await fetcher.fetch_litellm_pricing()
    assert "gpt-4o" in pricing
    assert pricing["gpt-4o"]["input_cost_per_token"] > 0


@pytest.mark.asyncio
async def test_cache_expiration():
    fetcher = DynamicPricingFetcher()
    await fetcher.refresh_pricing()
    # Freshly refreshed cache should be within the 24-hour TTL
    assert fetcher._is_cache_valid()


@pytest.mark.asyncio
async def test_external_api_failure():
    # Graceful degradation when both external APIs are down
    fetcher = DynamicPricingFetcher()
    # Mock API failures here (e.g. with respx or aioresponses)
    pricing = await fetcher.refresh_pricing()
    # Should return cached pricing or an empty dict, never raise
    assert isinstance(pricing, dict)
```

Integration Tests

```python
import pytest


@pytest.mark.asyncio
async def test_byok_uses_dynamic_pricing():
    # Verify BYOKHandler delegates cost estimation to DynamicPricingFetcher
    handler = BYOKHandler(tenant_id="test")
    cost = await handler.estimate_cost("gpt-4o", 1000, 500)
    assert cost > 0


@pytest.mark.asyncio
async def test_pricing_refresh_background_task():
    # Test the Celery task for pricing refresh
    # Verify pricing is refreshed when the 24-hour TTL expires
    pass
```

---

10. Next Steps

Immediate Actions

  1. **Review upstream implementation** - Read atom-upstream/backend/core/dynamic_pricing_fetcher.py fully
  2. **Assess SaaS requirements** - Confirm tenant isolation, billing, and quota needs
  3. **Create implementation plan** - Detailed tasks with acceptance criteria
  4. **Get approval** - Present gap analysis to stakeholders for prioritization

If approved, start with **Phase 1: Port DynamicPricingFetcher** as it provides the highest business value with manageable risk.

**Files to Copy:**

  • atom-upstream/backend/core/dynamic_pricing_fetcher.py

**Files to Modify:**

  • backend-saas/core/llm/embedding/providers.py (remove hardcoded costs)
  • backend-saas/core/llm/byok_handler.py (integrate dynamic pricing)
  • backend-saas/main_api_app.py (add pricing refresh endpoint)

**New Files:**

  • backend-saas/tests/unit/test_dynamic_pricing_fetcher.py
  • backend-saas/core/tasks/pricing_refresh_task.py (Celery task)

---

Appendix: Code Samples

Example: Dynamic Pricing Integration

```python
# Current SaaS (hardcoded)
class OpenAIEmbeddingProvider(BaseEmbeddingProvider):
    MODELS = {
        "text-embedding-3-small": {
            "cost_per_1m_tokens": 0.02,  # Hardcoded
        },
    }


# After porting (dynamic)
class OpenAIEmbeddingProvider(BaseEmbeddingProvider):
    def __init__(self, api_key: str | None = None):
        super().__init__(api_key)
        from core.dynamic_pricing_fetcher import get_pricing_fetcher
        self.pricing_fetcher = get_pricing_fetcher()

    def estimate_cost(self, text: str, model: str) -> float:
        pricing = self.pricing_fetcher.get_model_price(model)
        if pricing:
            tokens = self._estimate_tokens(text)
            return pricing["input_cost_per_token"] * tokens
        # Fallback to hardcoded cost
        return super().estimate_cost(text, model)
```

Example: Cache-Aware Routing

```python
# After Phase 2 implementation
from core.dynamic_pricing_fetcher import get_pricing_fetcher

fetcher = get_pricing_fetcher()


def select_model(estimated_tokens: int) -> str:
    # Check if the preferred model supports prompt caching
    if fetcher.model_supports_cache("gpt-4o"):
        min_tokens = fetcher.get_cache_min_tokens("gpt-4o")
        if estimated_tokens >= min_tokens:
            # Prompt is large enough to benefit from caching discounts
            return "gpt-4o"
    # Otherwise fall back to the cheaper non-cached model
    return "gpt-4o-mini"
```

---

Conclusion

**Critical Gap:** SaaS is missing DynamicPricingFetcher that provides real-time cost data from LiteLLM and OpenRouter APIs.

**Impact:** High business value - automatic pricing updates, cache-aware routing, cost optimization.

**Recommendation:** Port dynamic_pricing_fetcher.py from upstream as Phase 1, followed by cache-aware routing in Phase 2.

**Estimated Effort:** 3-5 days total for full implementation including testing.

---

*Generated: 2026-04-01*

*Comparison: atom-saas (SaaS) vs atom-upstream (Open Source)*

*Focus: External API integrations for costs, health monitoring, and benchmarks*